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ABSTRACT 

We consider density estimators based on the nearest neighbors method applied to discrete point 
distibutions in spaces of arbitrary dimensionality. If the density is constant, the volume of a hyper- 
sphere centered at a random location is proportional to the expected number of points falling within 
the hypersphere radius. The distance to the N-th nearest neighbor alone is then a sufficient statistic 
for the density. In the non-uniform case the proportionality is distorted. We model this distortion by 
normalizing hypersphere volumes to the largest one and expressing the resulting distribution in terms 
of the Legendre polynomials. Using Monte Carlo simulations we show that this approach can be 
used to effectively address the tradeoff between smoothing bias and estimator variance for sparsely 
sampled distributions. 
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1. Introduction 

Calculations based on proximity relations with nearest neighbors appear in a 
wide variety of astronomical problems. The distance to AMh nearest neighbor can 
be converted into a measure of density by means of simple inversion. Pioneer- 
ing uses of this technique in astronomy include von Hoerner (1963) and Dressier 
(1980). It is known (Casertano and Hut 1985) that such conversion biases density 
estimates by a factor of N/(N — 1) and increases their variance by N /{N — 2), 
where N is the number of considered nearest points. Density estimators based on 
N = 1 ox N = 2 are therefore of little use and at N = 4 half of the available in- 
formation is lost. Details of procedures applied in practice and the relative merits 
of various choices of N are reviewed by Schmeja (2011), Haas et al. (2012) and 
Muldrew et al. (2012). It turns out that the adopted value of N is typically between 
3 and 10. The most frequently used values are 3, 4, and 5, where the effect of 
diminishing accuracy is large. However, the investigators have a good reason to 
keep N small. When the density varies in space, its estimate based on the nearest 
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neighbors method is effectively an average over the volume set by the N-th nearest 
neighbor and differs from the local value. This smoothing bias is unavoidable in 
the case of variable density and independent of the rarely mentioned reciprocity 
bias described by Casertano and Hut (1985). Choosing small N limits the influ- 
ence of the smoothing bias for the price of increasing the variance. Another way of 
diminishing the smoothing bias was introduced by Ivezic et al. (2005) and Cowan 
and Ivezic (2008) who take a "Bayesian" approach to combine contributions from 
all N nearest neighbors. The net effect is, again, lower bias at the cost of increased 
variance. Here we propose a new method of dealing with the smoothing bias that 
captures the information on density variations contained in distances to all N near- 
est neighbors using the Legendre series expansion. 

2. Uniform Density Distribution 

In this section we rederive the formula for the mean density in the case of a uni- 
form point distribution. In our derivation we particularly emphasize an alternative, 
and in fact more natural, approach to the problem that adopts the volume per point 
instead of the mean density of points as the basic unknown. This also serves as an 
introduction to the non-uniform case described in the next section. 

Let us consider a metric space with an arbitrary number of dimensions. We 
will assume that the space is populated by randomly distributed pointlike objects 
in such a way that the expectation value of the number of objects n contained in an 
arbitrarily chosen subspace of volume v is proportional to the volume v with a pro- 
portionality constant po . So the expected number of objects contained in volume v 
is (n(v)) = pov, and po that can be defined as the density of our pointlike objects 
is the unknown to be found. An alternative treatment of our problem, which is the 
case of the nearest neighbors method, consists of finding the volume corresponding 
to a predefined number of points N. The expected value of volume (v) is then 
proportional to the number of points with a proportionality constant defined as 
the volume per point. So the expected volume over N points is (vjy) =(jN. In this 
case /j is the basic unknown. 

Let us fix the origin of cartesian coordinates at an arbitrarily chosen point and 
imagine a series of N hyperspheres centered on the origin such that only one point- 
like object resides on the surface of each hypersphere. The sequence is ordered 
according to increasing volume vi,V2, . . . ,Vfj. It is easy to see that the problem is 
identical to the well known case of events occuring randomly but at a constant aver- 
age rate (e.g. Eadie et al. 1982). Examples often presented in statistical textbooks 
are radioactive decay, telephone calls, or recording photons arriving from a faint 
astronomical object. In our case the volume plays the role of time. It is worthwhile 
to point out that the variables in any pair of v, values are not statistically indepen- 
dent because the central part of the larger hypersphere is identical with the smaller 
one. In order to deal with statistically independent observables we will consider 
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the first order differences of consecutive volumes 



Xi = Vj-V;_l i=l,2,...,iV (1) 

with v = 0. Random variables Xj wee mutually statistically independent and their 
probability distribution is exponential. Therefore, the corresponding probability 
density can be written as 

/(*,-) = -«"«. (2) 

For a uniform distribution of our pointlike objects all ^ values are identical and 
equal to ■ Therefore the joint probability density of all Xj is 

L = f(xi , x 2 , . . . , x N ) = -L«T * Ll 1 * (3) 
We can find a maximum likelihood estimator for the volume per point (/io) 

This estimator is based solely on the position of the most distant neighbor in the 
sample. Under the assumption of constant density, this estimator is sufficient and 
unbiased, i.e. it already includes all information on density contained in our sample. 
Exact positions of less distant neighbors are irrelevant. 

The probability density of the random variable vn follows the Gamma distri- 
bution 

(JV-l) 

with expectation value (vn) =^oN, variance c 2 (vn) =HqN, and standard deviation 
o(viv) =^0V^V- The corresponding values for the estimated value of n are (ji) =^0 , 
variance c 2 (fj) =^0/^ an ^ standard deviation oQi) = i uo/v / ^V- It is worth noting 
that the variance of the estimated volume per point /u is exactly equal the lowest 
possible value set by the sampling statistics because it is inversely proportional to 
the number of independent observations ,/V and the variance is defined for all values 
of ./V starting with N=l. 

The probability density of vn is given by Equation 5. We can treat all the re- 
maining volumes v, with i < N as random variables uniformly distributed between 
and vjy. We can now write the joint probability density of volumes v; for uniform 
point distributions 

at— 1 y 

L = /(Vl , V 2 , ■ ■ ■ , V N ) = f(v N ) ]^[ f(Vi I V N ) = f{v N ) (^yv- 1 ' 

where 1/vn is the conditional probability density of any point other than AMh 
given that the volume vn is fixed. 
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Casertano and Hut (1985) derived analogous formulae for the alternative case of 
point density estimation. They had to consider an inverse value of directly observ- 
able vn that led to the use of the inverse Gamma probability distribution instead of 
the Gamma distribution, and consequently to a loss of information. The estimator 
of density is 

vn 

and the variance expressed in terms of the estimated density is 

o 2 (p) = J^l. (8) 
VF; N-2 

Therefore the above approach should only be used for N >2. The variance of 
this estimator is larger than the sampling statistics limit, even drastically so for 
very small N. Evaluating density at a location coinciding with one of the pointlike 
objects is no different. Points at the origin of coordinates should not be counted. 



3. Non-uniform Density Distribution 



The optimal properties of the vn estimator degrade for density profiles with 
progressively larger deviations from a uniform distribution. The smoothing bias is 
increasing. In addition, the random variables are no longer mutually statistically 
independent and now the exact positions of less distant neighbors, normalized to 
the value of vn, carry information that can be used to limit the influence of smooth- 
ing bias. An approximation formula based on power series expansion is a natural 
choice and coefficients can be estimated using least squares. A less complicated 
approach is to use orthogonal functions constructed from the power series of the 
same order. In the latter case the unknown coefficients can be determined by con- 
volutions, which is both simpler and faster. 

Let us consider the sequence of normalized volumes y, ■ = vi/vn with the excep- 
tion of the last element taken as normalization. Spherical volumes in D dimensions 
are computed as v, = rf% D l 2 /T(D/2 + 1) , where r,- is the distance to the i-th near- 
est neighbor. In the uniform case vi, V2, . . . , vn-\ are uniformly distributed over 
the interval (0,vjv)- Observed deviations from the uniform distribution can tell us 
something about the density distribution inside the volume defined by the iV-th 
point. We will search for this something with the help of the Legendre polynomial 
expansion of order k . In the following derivation we make use of the shifted Leg- 
endre polynomials Pi(y) = Pi(2y — 1) which are orthogonal on the interval (0,1) 
and are obtained from the regular Legendre functions Pi defined over the interval 
[—1,1]. The observables can be expressed with the help of the Dirac delta function 

P(y) = -I 1 5(y-y/) ! (9) 

VN i=l 
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and in this form are ready to be convolved with the shifted Legendre functions. The 
first few basis functions are 

P (y) = 1 (10) 

P 1 (y)=2y-l (11) 

P 2 (y) = 6y 2 -6y+l (12) 

P 3 (y) =20y 3 -30y 2 + \2y-\. (13) 



A convolution with the / -th term yields the corresponding expansion coefficient 

P/ = (2/ + l) f\{y)P l {y)dy=' 2 ^-\p l {y i ) (14) 

Jo v N fr[ 

and the resulting interpolation formula for density is 

1=0 

The extrapolated value of central density expressed in points per unit volume is 

p N , k = p (y) \ y =o = — N f I ( 2l + 1 )Pi (yi) p i (0) • (16) 
V N i=i ; =0 

Using the fact that Pi(0) = (— 1)' and returning to regular Legendre polynomials P/ 
we obtain the final formula for the estimator 

PN,k = - N f J t(-l) l (2l + \)P l (2 yi - 1 ) , (17) 

V N i=l 1=0 

which is convenient to evaluate numerically as the second sum is the value of the 
regular Legendre series of order k with fixed coefficients taken at 2yi — 1 . For &=0 
we recover the original A^-th nearest neighbor estimator p#,o = (N — 1)/vjv - 

The simple method presented above samples the orthogonal basis functions at 
— 1 points ignoring their shape over the rest of the interval. We can imagine an- 
other approach that utilizes all information in basis vectors, e.g. by integrating the 
Legendre polynomials between pairs of consecutive data points. However, simula- 
tions performed using this alternative method have not improved the final accuracy 
of density estimates. 



4. Monte Carlo Experiments 

With the help of Monte Carlo simulations we can further investigate the prop- 
erties of our new estimator. In Figure 1 we compare the bias and standard deviation 
of three estimators: 1) AMh nearest neighbor (hereafter NT), 2) Ivezic et al. (2005) 
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Fig. 1 . Performance of three density estimators based on the nearest neighbors method applied to a 
uniform two-dimensional density distribution: 1) N-th nearest neighbor method (NT), 2) prescrip- 
tion of Ivezic et al. (2005) (105), and 3) our new algorithm based on interpolation of N nearest 
neighbors (NN) with density variations turned off (£ = 0). Standard deviation (solid lines) and bias 
(dashed lines) were calculated as a function of N by averaging the results of 10 4 independent trials. 
Theoretical noise limit of the inverse volume estimator is shown as the thick black curve. 



(105), and 3) our N nearest neighbors estimator from Section 3 (NN) with k = 0. 
The NN algorithm with k = is mathematically equivalent to NT estimator. The 
experiment consists of 10 4 realizations of a uniform two-dimensional density field. 
Each algorithm is applied to the same data with 3 <N< 10. All three estimators are 
unbiased ( (n) /n~ 1 ) and follow the theoretical variance curve. The performance of 
the 105 algorithm in this case is essentially the same as for the other two methods. 
The standard deviation of higher order NN estimators is shown in Figure 2. The 
number of nearest neighbors N varies between the lowest possible value (k + 3 ) 
and 30. Figure 3 demonstrates how the same three estimators handle non-uniform 
density distributions and smoothing bias. On top of the uniform two-dimensional 
density field we now include a Gaussian peak that doubles the surface density of 
points at the maximum. As before, we run 10 4 density estimates at the location of 
the peak. The overdensity at the center is sparsely sampled with only 10 data points 
drawn from the two-dimensional normal distribution. The bias given as (n)/n is 
shown for k + 3 < N < 30 . As N increases, the estimators effectively average input 
data over larger areas. Again, the NN k = case is just the A^-th nearest neighbor 
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Fig. 2. Accuracy of our modified density estimator based on the nearest neighbors method applied to 
a uniform two-dimensional density distribution. Standard deviation was calculated as a function of 
the number of nearest neighbors N by averaging the results of 10 4 independent trials. The variance 
of the estimate degrades for methods of order k > 0. Theoretical noise limit of the inverse volume 
estimator is shown as the thick black curve. 

algorithm. The estimator of Ivezic et al. (2005) can absorb some bias at the cost of 
increased variance. Our new method is quite efficient in removing the smoothing 
bias and offers some flexibility in handling the tradeoff between the bias and the 
variance of the estimator. The second order estimate is practically unbiased in this 
test. 



5. Conclusions 



Using the nearest neighbors method we obtain N independent observables x, 
with which to estimate the density. We can treat them as N information units as 
long as they are used to measure the volume per data point. Inverting the observ- 
ables and considering them as measures of density effectively lowers the number of 
information units to N — 2. Consequently, the variance of the estimator increases 
by N/(N — 2) . This excludes 1 and 2 as the allowed values of N. The information 
content of the observables jc, cannot be increased by transforming them. 

In the non-uniform case the accuracy of our results is also affected by the 
smoothing bias, in addition to the basic limitation due to the number of degrees 
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Fig. 3. Comparison of the bias introduced by various density estimators based on the nearest neigh- 
bors method applied to a non-uniform two-dimensional density distribution. The simulated density 
field consists of 10 points drawn from a Gaussian distibution on top of a uniform background that 
doubles the density at the maximum. The lines show the ratio of the estimated density and true den- 
sity at the peak as a function of the number of nearest neighbors N , and were calculated by averaging 
the results of 10 4 independent trials. 



of freedom. The resulting density estimate is the average density inside the volume 
defined by the most distant point in the sample and not the density at the chosen 
center. The smoothing bias increases with increasing number of neighboring points 
N supplied to the estimator. Therefore, the choice of Af is a tradeoff between the 
accuracy and the smoothing bias. 

In Section 3 we used the distribution of distances to N nearest neighbors to 
"fit" a simple interpolation formula that captures local density variations around an 
arbitrary center. A density estimate at the center is then obtained by extrapolating 
this formula to zero distance. And it is this extrapolation that is responsible for a 
large increase in variance as higher order terms are included in the density profile. 
However, increasing k allows one to use larger N while maintaining control over 
smoothing bias, which results in more accurate density estimates. The best values 
of N and k for a particular application may be selected with the help of Monte 
Carlo experiments. 
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